<!DOCTYPE html>
<html lang="en">

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Treegion Scheduling</title>
<link rel="stylesheet" type="text/css" href="https://gcc.gnu.org/gcc.css" />
</head>

<body>
<h1>Treegion Scheduling</h1>

<h2>Table of Contents</h2>

<ul>
<li><a href="#news">Latest news</a></li>
<li><a href="#intro">Introduction</a></li>
<li><a href="#contributing">Contributing</a></li>
<li><a href="#readings">Readings</a></li>
</ul>

<h2 id="news">Latest News</h2>

<dl>
<dt>2005-02-03</dt>
<dd><p>Treegion formation with tail duplication based on ICSE checked in.</p></dd>
<dt>2005-12-23</dt>
<dd><p>Natural Treegion formation checked in.</p></dd>
<dt>2005-12-16</dt>
<dd><p>Creation of sched-treegion-branch.</p></dd>
</dl>

<h2 id="intro">Introduction</h2>
<p>
Instruction scheduling is a critical phase of compilation for extracting 
large amounts of ILP from a program. In this work we present the status 
of the implementation of an architecture-independent, aggressive global 
instruction scheduler based on Treegions. We also present a technique for 
efficiently tail duplicating treegions considering the tradeoff between 
codes size and ILP.  The implementation 
currently resides on the sched-treegion-branch.</p>

<h2 id="tree_form">Natural Treegion Formation</h2>

<p>
A treegion is a non-linear, single-entry, multiple-exit region of code 
containing basic blocks that constitute a subgraph of the control-flow-graph 
(CFG). Building large regions are critical for enabling a compiler to 
exploit large amounts of parallelism through speculation. Unlike other 
region formation algorithms, such as traces and superblocks, treegions take 
into account multiple execution paths, producing larger regions and more 
opportunities for speculation.  In addition, treegions do not require special 
architectural features (e.g., predication for Hyperblocks) for region formation.
GCC currently supports both linear regions, in the form of superblocks (tracer.c)
and extended basic blocks (sched-ebb.c), as well as non-linear regions 
(sched-rgn.c), which are limited to loop-free procedures and reducible inner 
loops.  Treegions have the advantage that unlike superblocks and extended basic 
blocks, treegion formation is based on the CFG and does not require profile 
information. Further, treegion formation typically results in much larger 
regions as compared to the current region formation implementation.
</p>

<p>
Natural treegion formation (i.e., treegions formed without tail duplication) 
begins at the entry point of a procedure, which forms the root of a new 
treegion. Starting at this root, the CFG is traversed and successor basic 
blocks are absorbed into the treegion if they are not a merge point (i.e., 
have multiple predecessor edges).  Once all possible blocks have been added 
to the treegion, the leaf nodes, all of which are merge points, are added to 
a saplings list.  These saplings form the roots of other treegions.  For each 
sapling the same process is applied until all basic blocks in the CFG have been 
consumed.  Treegion formation requires only a single pass over the CFG.  Treegions 
also simplify the calculation of dominators since all basic blocks dominate all 
successor blocks.
</p>

<h2 id="tail_dup">Tail Duplication</h2>

<p>
In this section we discuss our method for efficient tail duplication, with 
treegions being the unit of duplication.  Our heuristic considers the increase 
in code size relative to the increase in ILP from tail duplication.  This metric, 
referred to as the Instantaneous Code Size Efficiency (ICSE), is defined as 
the change in IPC relative to the increase in code size after tail duplication.
</p>

<p>
To calculate the ICSE of a tail duplicate candidate the IPC of a region must 
be known at compile time.  Since this information is not available we've 
developed a heuristic to calculate the estimated execution time of a treegion.
We define the estimated execution time of a multi-path treegion as the sum of 
the expected execution time of each path through treegion biased by the 
execution frequency of each path.  The execution frequency of each path is 
determined through profiling or other heuristics.  The expected execution time 
of any path is the maximum of the data dependence bound and the resource bound.
The data dependence bound is calculated as the height of the true data dependence 
bound of the data dependence graph (DDG) for a given treegion.  The resource 
bound is computed as the number of instructions in the treegion divided by the 
issue width of the target machine.
</p>

<p>
The tail duplication process begins by calculating the ICSE for each tail 
duplication candidate. Each control edge between a parent and child treegion 
is a potential candidate with the child being the target for duplication.  
After the ICSE has been calculated for each candidate, the best candidate is 
chosen for tail duplication.  After duplication the effected treegions are 
reformed and the candidate list is updated. Tail duplication continues until 
either no more candidates exist or no more candidates are above the ICSE threshold.
Discussion of this optimal threshold can be found in [1]. Under code-size or 
compile time constraints, treegion size may also be limited by the number of 
basic blocks and/or the number of instructions contained with in the treegion.
</p>

<h2 id="tree_sched">Treegion scheduling - Tree Traversal Scheduling</h2>

<p>
Due to the acyclic nature of treegions, the Haifa scheduler does not
require any modifications to schedule treegions.  We do however have a
number of enhancements currently in development to improve the performance 
of treegion scheduling.
</p>

<p>
The goal of Tree Traversal Scheduling (TTS) is to speedup every execution 
path through the treegion.  This is accomplished by prioritizing speculative
instructions from different paths which compete for limited resources. Profile 
information is used to prioritize the scheduling of basic blocks within a 
treegion.  Tree traversal scheduling consists of two steps (1) construction of 
the control/data dependence graph to perform instruction ordering and 
(2) scheduling of the instructions in the treegion.  
</p>

<p>
For each treegion instructions are prioritized based on a.) execution frequency, 
b.) exit count heuristic to resolve ties from (a), and c.) data dependence height 
to resolve ties from (b).  Heuristic (a) gives priority to the most frequently 
executed path.  Heuristic (b) gives priority to instructions that help more exits, 
which has the potential to make more speculative instructions ready for execution.
</p>

<p>
The algorithm for treegion scheduling is as follows: (1) For a treegion, sort the 
basic blocks according to a depth-first traversal order with the child block 
selected with the highest execution frequency.  (2) Begin list scheduling blocks 
at the root basic block.  (3) During the scheduling of a basic block, consider 
speculation for instructions dominated by this basic block. 
(4) Repeat step 3 until all basic blocks in the treegion have been scheduled.
</p>

<p>The implementation currently resides on the branch.</p>

<h2 id="contributing">Contributing</h2>

<p>Checkout the sched-treegion branch.</p>

<p>When posting to the development lists, please mark messages and patches 
with [sched-treegion] in the subject. The usual contribution and testing 
rules apply.  This branch is maintained by 
<a href="mailto:mcrosier@unity.ncsu.edu">Chad Rosier</a>.</p>

<h2 id="readings">Readings</h2>

<p>Lots of useful information is present at the <a
href="https://prod.tinker.cc.gatech.edu">TINKER Microarchitecture and
Compiler Research</a> homepage. More relevant papers:</p>

<dl>
<dt>[1]</dt>

<dd>
H. Zhou, and T.M. Conte, 
<a href="https://prod.tinker.cc.gatech.edu/symposia/interact02.pdf">
Code Size Efficiency in Global Scheduling for ILP Processors</a>,
Proceedings of the 6th Annual Workshop on the Interaction between Compilers 
and Computer Architectures (INTERACT-6), Cambridge, MA, February 2002.
</dd>

<dt>[2]</dt>

<dd>
H. Zhou, M. D. Jennings, and T. M. Conte,
<a href="https://prod.tinker.cc.gatech.edu/symposia/lcpc01.pdf">
Tree Traversal Scheduling: A Global Scheduling Technique for VLIW/EPIC 
Processors</a>, Proceedings of the 14th Annual Workshop on Languages and 
Compilers for Parallel Computing (LCPC'01), Cumberland Falls, KY, August 2001.
</dd>

<dt>[3]</dt>

<dd>
W. A. Havanki, S. Banerjia, and T. M. Conte,
<a href="https://prod.tinker.cc.gatech.edu/symposia/hpca4_treegions.pdf">
Treegion scheduling for wide-issue processors</a>, Proceedings of the 
4th International Symposium on High-Performance Computer Architecture 
(HPCA-4), Las Vegas, Feb. 1998.
</dd>

<dt>[4]</dt>

<dd>
S. Banerjia, W.A. Havanki, and T.M. Conte,
<a href="https://prod.tinker.cc.gatech.edu/symposia/europar97.pdf">
Treegion scheduling for highly parallel processors</a>, 
Proceedings of the 3rd International Euro-Par Conference (Euro-Par'97), 
Passau, Germany, pp.1074-1078, Aug. 1997.
</dd>
</dl>


</body>
</html>
